A Lap Around Data Science

[@ashic]

Data Science

Data Scientist

Kris Jack, Chief Data Scientist @ Mendeley

What is it?

Data Science

From The Data Scientist’s Toolbox course on Coursera

Getting (and Cleaning) Data

Borat, on munging data

Exploratory Data Analysis

Explore, summarise, visualise

Exploratory Data Analysis

Anscombe’s Quartet

Summaries of four datasets…
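
The point of this slide can be checked numerically. The sketch below hard-codes the quartet's published values (Anscombe, 1973) and computes the usual summaries with the standard library only; the helper names `pearson` and `summarise` are illustrative, not from the talk. All four datasets should come out with nearly identical means, variances and correlation, even though their scatter plots look nothing alike.

```python
from math import sqrt
from statistics import mean, variance

# Anscombe's quartet. Datasets I-III share the same x values.
x123 = [10, 8, 13, 9, 11, 14, 6, 4, 12, 7, 5]
quartet = {
    "I": (x123, [8.04, 6.95, 7.58, 8.81, 8.33, 9.96, 7.24, 4.26, 10.84, 4.82, 5.68]),
    "II": (x123, [9.14, 8.14, 8.74, 8.77, 9.26, 8.10, 6.13, 3.10, 9.13, 7.26, 4.74]),
    "III": (x123, [7.46, 6.77, 12.74, 7.11, 7.81, 8.84, 6.08, 5.39, 8.15, 6.42, 5.73]),
    "IV": ([8, 8, 8, 8, 8, 8, 8, 19, 8, 8, 8],
           [6.58, 5.76, 7.71, 8.84, 8.47, 7.04, 5.25, 12.50, 5.56, 7.91, 6.89]),
}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    spread = sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return cov / spread


def summarise(xs, ys):
    """Sample mean and variance of x and y, plus their correlation."""
    return {
        "mean_x": mean(xs),
        "var_x": variance(xs),
        "mean_y": mean(ys),
        "var_y": variance(ys),
        "r": pearson(xs, ys),
    }


summaries = {name: summarise(xs, ys) for name, (xs, ys) in quartet.items()}
for name, s in summaries.items():
    print(name, {k: round(v, 2) for k, v in s.items()})
```

Summaries alone cannot tell these datasets apart, which is why the next step is always to plot them.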

Exploratory Data Analysis

Anscombe’s Quartet

… and their visualisations

Exploratory Data Analysis - Clustering

k-means

DBSCAN
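
As a rough illustration of the k-means idea named on this slide, here is a minimal sketch of Lloyd's algorithm in plain Python. The toy points, the fixed starting centroids and the function name `kmeans` are assumptions made for the example; a real analysis would more likely use a library implementation with a smarter initialisation.

```python
import math


def kmeans(points, centroids, iterations=10):
    """Lloyd's algorithm: assign each point to its nearest centroid, then
    move each centroid to the mean of its assigned points, and repeat."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: math.dist(p, centroids[i]))
            clusters[nearest].append(p)
        centroids = [
            # Keep the old centroid if a cluster ends up empty.
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster))
            if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


# Two visually obvious groups (made-up data).
points = [(1, 1), (1, 2), (2, 1), (8, 8), (9, 8), (8, 9)]
centroids, clusters = kmeans(points, centroids=[(1, 1), (8, 8)])
print(centroids)
```

k-means needs the number of clusters up front and tends to favour compact, roughly round groups; DBSCAN instead grows clusters from dense regions and can label sparse points as noise.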

Exploratory Data Analysis - Dimensionality Reduction

Reproducible Research

Statistical Inference

Machine Learning

supervised

unsupervised

Machine Learning - Regression

Gradient Descent
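
A minimal sketch of gradient descent for simple linear regression, assuming a mean-squared-error loss and a fixed learning rate. The data, the hyperparameters and the function name `gradient_descent` are made up for illustration and are not taken from the slides.

```python
def gradient_descent(xs, ys, learning_rate=0.05, steps=2000):
    """Fit y = w * x + b by minimising mean squared error.

    Every step uses the whole dataset to compute the gradient
    (so-called batch gradient descent).
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        errors = [w * x + b - y for x, y in zip(xs, ys)]
        grad_w = 2 / n * sum(e * x for e, x in zip(errors, xs))
        grad_b = 2 / n * sum(errors)
        # Step against the gradient, scaled by the learning rate.
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


# Noise-free toy data generated from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]
w, b = gradient_descent(xs, ys)
print(w, b)
```

On this toy data the fitted slope and intercept approach 2 and 1. A learning rate that is too large makes the updates diverge, while one that is too small makes convergence slow.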

Machine Learning - Regression

Stochastic Gradient Descent
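
The stochastic variant can be sketched in the same spirit: instead of averaging the gradient over the whole dataset, it updates the parameters after each individual example, visiting the examples in a shuffled order every epoch. The data, learning rate, epoch count, seed and the function name `sgd` are all assumptions for this example.

```python
import random


def sgd(xs, ys, learning_rate=0.01, epochs=2000, seed=0):
    """Fit y = w * x + b with stochastic gradient descent on squared error."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    w, b = 0.0, 0.0
    data = list(zip(xs, ys))
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            # Gradient of (w * x + b - y) ** 2 for this single example.
            error = w * x + b - y
            w -= learning_rate * 2 * error * x
            b -= learning_rate * 2 * error
    return w, b


# Noise-free toy data generated from y = 2x + 1.
sgd_xs = [0.0, 1.0, 2.0, 3.0, 4.0]
sgd_ys = [2 * x + 1 for x in sgd_xs]
sgd_w, sgd_b = sgd(sgd_xs, sgd_ys)
print(sgd_w, sgd_b)
```

Each update is cheap and noisy, which is what makes the approach attractive when the dataset is too large to scan on every step.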

Machine Learning - Regression

Regression in Higher Dimensions

Machine Learning - Classification

Machine Learning - Classification

Classification blunder

Machine Learning - Clustering

Machine Learning - Recommender Systems

Machine Learning - Recommender Systems

target

It might go wrong…

Machine Learning - Deep Learning

Developing Data Products

minard

Developing Data Products

minard-cool

Developing Data Products

R, scikit-learn, D3, Spark, Kafka, Cassandra, Hadoop, Accumulo

Learning Resources - Coursera

  • Master Statistics with R (Duke University)
    Book: OpenIntro Statistics.
  • Data Science Specialization (10 Courses)
  • Mining of Massive Datasets (http://web.stanford.edu/class/cs246/)
    Used to be on Coursera. The accompanying book is very good.

Learning Resources - Stanford Online & edX

  • Statistical Learning (Stanford Online) by Hastie & Tibshirani
    Book: An Introduction to Statistical Learning.

  • edX has various beginner-level courses on Apache Spark.

Learning Resources - Udacity & Others

  • Nanodegrees in Data Analysis, Machine Learning, Self-Driving Cars, Predictive Analytics, etc.
  • DataStax Academy has many free online courses on various aspects of Cassandra.

Learning Resources - Books

  • OpenIntro Statistics
  • An Introduction to Statistical Learning
  • Mining of Massive Datasets
  • The Elements of Statistical Learning

Thanks